Skip to content

Introduce follower replication#249

Open
Fullstop000 wants to merge 2 commits intotikv:masterfrom
Fullstop000:follower_replication
Open

Introduce follower replication#249
Fullstop000 wants to merge 2 commits intotikv:masterfrom
Fullstop000:follower_replication

Conversation

@Fullstop000
Copy link
Copy Markdown
Member

@Fullstop000 Fullstop000 commented Jul 1, 2019

#136

New features

  • Add some new fields in Message
  • New concepts Group and Delegate in the cluster topology
  • A message now could be sent to a proxy and the proxy redirects the message to the destination

TODOs

  • The leader can specify the Delegate of a Group
  • The Delegate can bcast_append or send_append to the other members in the Group
  • Stable group delegate
  • Scenarios based testing

Problems

The ConfChange protocol for raft groups might need to be discussed. Solved as GroupConfig is now volatile.

The implementation has been ported to etcd: etcd-io/etcd#11455.

@Fullstop000 Fullstop000 changed the title [WIP] Introduce follower replication Introduce follower replication Aug 27, 2019
@Fullstop000
Copy link
Copy Markdown
Member Author

I've finished the main part of this PR. But I'm not sure the best way to implement the ConfChange for altering raft groups dynamically so I want a discussion with your guys.

PTAL @siddontang @hicqu @BusyJay @Hoverbear

@Hoverbear
Copy link
Copy Markdown
Contributor

@Fullstop000 Can you please include links to any design docs you worked on?

@Fullstop000
Copy link
Copy Markdown
Member Author

@Hoverbear Can I make a WIP PR in tikv/rfc ?

@Hoverbear
Copy link
Copy Markdown
Contributor

@Fullstop000 I think it's OK. :) I'd still really appreciate seeing an RFC (like https://docs.google.com/document/d/1Sp9Tnc_nk_i0feTOgLGPT3jZYVFjfP7C6pwA-wuVHko/edit?ts=5c8c5fe3 which you worked on)

@Fullstop000 Fullstop000 force-pushed the follower_replication branch 4 times, most recently from ae47fee to 3502119 Compare October 29, 2019 09:45
@Hoverbear Hoverbear added the Feature Related to a major feature. label Nov 7, 2019
@Hoverbear
Copy link
Copy Markdown
Contributor

@hicqu PTAL

Comment thread src/raft_log.rs
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/group/mod.rs Outdated
Comment thread src/group/mod.rs Outdated

/// The max number of members of a group of which contains leader and the leader never pick a delegate.
/// If the group size is larger than this, a delegate will be picked even the leader belongs to this group.
pub max_leader_group_no_delegate: usize,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I prefer to remove this, because without explicit information about group_id, follower replication may be not helpful. For example 7 peers in one datacenter, follower replication can't save any network traffics.

Comment thread src/group/mod.rs Outdated
Comment thread src/raft.rs Outdated
// Also, A peer with smaller 'match' is able to receive more un-compacted entries from leader
// and then send them to others in same group.
//
fn pick_delegate(&self, group: &[u64], prs: &ProgressSet) -> u64 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems in the new implementation, choose the peer with most raft logs is better?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more reasonable. Actually maybe we can choose any unpaused peer?

Comment thread src/raft.rs Outdated
pub msgs: Vec<Message>,

/// The especial map for messages to be sent to group delegates.
pub delegated_msgs: HashMap<u64, Message>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to put it with groups

Comment thread src/raft.rs Outdated
}

// Whether the given peer could use Follower Replication
fn use_delegate(&self, to: u64) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer

fn use_delegate(&self, to: u64) -> bool {
    self.is_leader() &&
        self.groups.get_members(to).map_or(0, |members| members.len()) > 1 &&
        !self.groups.in_same_group(self.id, to)
}

And we need to make sure that peers with invalid group_id shouldn't exist in groups.

Comment thread src/raft.rs Outdated
/// of produced message should be set to the leader id.
pub fn send_append(&mut self, to: u64, prs: &mut ProgressSet) {
if self.use_delegate(to) {
if let Some(gid) = self.groups.get_group_id(to) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many indents. Maybe we can define use_delegate(to) -> Option<group_id>.

Comment thread src/raft.rs Outdated
None => self.pick_delegate(&members, prs),
};
if delegate != INVALID_ID {
if let Some(pr) = prs.get_mut(delegate) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling unwrap is better, because the progress must exist.

Comment thread src/raft.rs Outdated
}
None => {
let members = self.groups.get_members(to).cloned().unwrap(); // this is safe because of the checking in `self.use_delegate`
let delegate = match self.groups.get_delegate(gid) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer let delegate = self.pick_delegate(prs), which already contains get_delegate logic. And, its valid only if the pr is not paused.

Comment thread src/raft.rs Outdated
*maybe_commit = true;
}

fn handle_append_response_in_delegate(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems only difference between this and handle_append_response is the latter needs to handle leader transferring. That logic is compatible for delegates. So I think this function is unnecessary.

Comment thread src/raft.rs Outdated
self.set_prs(prs);
} else {
let members = self.groups.get_members(m.from_delegate).cloned();
self.bcast_append(members.as_ref());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bcast_append calls send_append, which checks members' delegate again. I believe it can be improved, maybe we can take a try?

Comment thread src/raft.rs Outdated
self.read_states.push(rs);
}
MessageType::MsgAppendResponse => {
if self.prs().get(m.from).is_none() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a common case for receiving responses from removed peers. So I think the log is unnecessary. Without it the code could be cleaner.

Comment thread src/raft.rs Outdated
//
// The delegate must satisfy conditions below:
// 1. Must be 'recent_active'
// 2. The progress state should be 'Replicate' but not 'paused'
Copy link
Copy Markdown
Contributor

@hicqu hicqu Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems Probe needs also to be allowed? I think calling is_paused here is fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probe might not be suitable for a delegate in the leader's view because the Probe member can probably just recover from a crash or is suffering the network partition.

Comment thread src/raft.rs Outdated
}

#[inline]
fn is_follower_replication_enabled(&self) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not accurate. Every peer can't know follower replication is enabled or not. It just choose a strategy after receives followers' messages.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add some comments to make this clear.

Comment thread harness/src/interface.rs Outdated
Comment thread harness/src/network.rs
Comment thread harness/tests/integration_cases/test_raft_follower_replication.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/raft.rs Outdated
Comment thread src/progress/progress_set.rs Outdated
hicqu
hicqu previously approved these changes Dec 17, 2019
@Fullstop000 Fullstop000 force-pushed the follower_replication branch 2 times, most recently from 88f583e to f89086c Compare February 25, 2020 08:17
Signed-off-by: Fullstop000 <fullstop1005@gmail.com>
@Fullstop000 Fullstop000 force-pushed the follower_replication branch 2 times, most recently from 88f583e to 1da637c Compare February 25, 2020 08:31
Signed-off-by: Fullstop000 <fullstop1005@gmail.com>
@TennyZhuang
Copy link
Copy Markdown

Any updates?

@Xuanwo
Copy link
Copy Markdown
Member

Xuanwo commented Aug 24, 2021

This PR is discontinued for so long, any updates here? Do we have a plan to get it merged?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature Related to a major feature.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants